Add support for Apple Silicon #153

Open
ZardashtKaya wants to merge 1 commit into yyfz:main from ZardashtKaya:main

Conversation

@ZardashtKaya

Pull Request: Enable MPS Support and Resolve Hardcoded CUDA Dependencies

Summary

This PR introduces support for Apple Silicon GPUs (MPS) and ensures the codebase is device-agnostic. It resolves AssertionError: Torch not compiled with CUDA enabled and NotImplementedError for specific operators on macOS, while maintaining full compatibility with CUDA and CPU backends.

Changes

1. Modernized Device Selection

  • Entry points (example.py, example_mm.py, example_vo.py, demo_gradio.py) now prioritize hardware in the following order: cuda > mps > cpu.
  • Added automatic detection of the best available device as the default option in argument parsers.
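The cuda > mps > cpu ordering described above could look roughly like the following sketch. The helper names `pick_device`, `detect_device`, and `build_parser` are hypothetical and are not taken from the PR's diff; the selection rule is kept as a pure function so it can be checked on a machine without any GPU.

```python
import argparse


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred backend name in the order cuda > mps > cpu."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends (requires torch to be installed)."""
    import torch

    # torch.backends.mps may be missing on old PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    mps_available = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: --device defaults to None, meaning auto-detect."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--device",
        default=None,
        help="cuda, mps, or cpu; auto-detected if omitted",
    )
    return parser
```

An entry point would then resolve the device with something like `device = args.device or detect_device()`, so an explicit `--device cpu` still overrides the automatic choice.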

2. MPS Operator Fallback

  • Set PYTORCH_ENABLE_MPS_FALLBACK=1 in all entry point scripts. This ensures that operators not yet natively implemented in MPS (e.g., _upsample_bicubic2d_aa) automatically fall back to the CPU instead of crashing.

3. Device-Agnostic Autocast & Precision

  • Model Layers: Fixed hardcoded device_type='cuda' in torch.amp.autocast calls within Pi3, Pi3X, and camera_head.py. These now dynamically use the input tensor's device type.
  • Precision Handling: Implemented safe dtype selection:
    • CUDA: bfloat16 (if compute capability >= 8) or float16.
    • MPS: float16 (standard for Metal).
    • CPU: float32.
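As an illustration of the precision rules listed above, here is a minimal sketch. The names `pick_dtype_name` and `autocast_for` are hypothetical, and the sketch assumes a PyTorch build whose `torch.amp.autocast` accepts the `mps` device type; it is not the code from the PR.

```python
from typing import Optional


def pick_dtype_name(device_type: str, cuda_major: Optional[int] = None) -> str:
    """Map a device type to the dtype name listed in the PR description."""
    if device_type == "cuda":
        # bfloat16 needs compute capability >= 8; older GPUs use float16.
        if cuda_major is not None and cuda_major >= 8:
            return "bfloat16"
        return "float16"
    if device_type == "mps":
        return "float16"
    return "float32"


def autocast_for(tensor):
    """Build an autocast context from the tensor's own device (requires torch)."""
    import torch

    device_type = tensor.device.type
    cuda_major = None
    if device_type == "cuda":
        cuda_major = torch.cuda.get_device_capability(tensor.device)[0]
    dtype = getattr(torch, pick_dtype_name(device_type, cuda_major))
    # Autocast is disabled on CPU, where the model runs in float32.
    return torch.amp.autocast(device_type, dtype=dtype, enabled=device_type != "cpu")
```

Deriving `device_type` from the input tensor, rather than passing `'cuda'` as a literal, is what lets the same model code run unchanged on all three backends.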

4. Memory Management

  • Updated Pi3XVO pipeline and Pi3X model to use torch.mps.empty_cache() when running on Apple Silicon, preventing memory fragmentation during long video processing.
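A device-aware cache release might be sketched as follows. The helper name `empty_device_cache` is hypothetical; `torch.cuda.empty_cache()` and `torch.mps.empty_cache()` are the existing PyTorch calls it dispatches to, and how the PR actually wires them into the pipeline is not shown here.

```python
def empty_device_cache(device_type: str) -> bool:
    """Release cached allocator memory on CUDA or MPS.

    Returns True if a cache-clearing call was issued, False otherwise.
    """
    if device_type not in ("cuda", "mps"):
        # No accelerator cache to clear for CPU (or unknown) devices.
        return False

    import torch  # imported lazily so the CPU path does not need torch

    if device_type == "cuda":
        torch.cuda.empty_cache()
    else:
        torch.mps.empty_cache()
    return True
```

In a chunked video loop this would typically be called once per chunk, for example `empty_device_cache(chunk_imgs.device.type)`.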

5. Bug Fixes

  • Fixed a NameError in pi3/models/pi3x.py where imgs was referenced out of scope; replaced with hidden.device.type.
  • Corrected logic in demo_gradio.py to allow the model to load and run on non-CUDA systems.

Technical Details

  • Files Modified:
    • example.py, example_mm.py, example_vo.py: Updated device/dtype logic and added MPS fallback.
    • demo_gradio.py: Modernized device initialization and inference precision.
    • pi3/models/pi3.py, pi3/models/pi3x.py: Patched autocast and empty_cache.
    • pi3/models/layers/camera_head.py: Fixed hardcoded device_type.
    • pi3/pipe/pi3x_vo.py: Added device-agnostic autocast and cache clearing.

Verification Results

Verified on M1 Pro (16GB RAM):

  • example_mm.py runs successfully with --interval 50.
  • example.py runs successfully.
  • example_vo.py runs successfully.
  • demo_gradio.py initializes and loads the model on mps.

Copilot AI review requested due to automatic review settings May 15, 2026 02:22

Copilot AI left a comment

Pull request overview

This PR adds Apple Silicon (MPS) support by making inference entry points and model autocast paths more device-aware while preserving CUDA/CPU execution.

Changes:

  • Adds CUDA/MPS/CPU device selection and per-device precision choices in examples and Gradio demo.
  • Replaces hardcoded CUDA autocast usage in model and pipeline code.
  • Adds MPS cache clearing and updates ignore patterns.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| example.py | Adds MPS fallback setup, device auto-selection, and device-aware autocast dtype. |
| example_mm.py | Applies device-aware setup to multimodal inference. |
| example_vo.py | Applies device-aware setup to VO pipeline entry point. |
| demo_gradio.py | Allows Gradio inference/model loading on CUDA, MPS, or CPU. |
| pi3/models/pi3.py | Makes disabled autocast block use the tensor device type. |
| pi3/models/pi3x.py | Makes multimodal/head disabled autocast blocks use tensor device types. |
| pi3/models/layers/camera_head.py | Removes hardcoded CUDA device type in camera head autocast block. |
| pi3/pipe/pi3x_vo.py | Uses device-aware autocast and CUDA/MPS cache clearing. |
| .gitignore | Updates ignored generated/local files. |

Comment thread example.py
Comment on lines 1 to +7
import torch
import argparse
import os

# Set MPS fallback before importing other modules that might use torch
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

Comment thread example_mm.py
Comment on lines +6 to +7
# Set MPS fallback before importing other modules that might use torch
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
Comment thread example_vo.py
Comment on lines +6 to +7
# Set MPS fallback before importing other modules that might use torch
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
Comment thread demo_gradio.py
Comment on lines +13 to +14
# Set MPS fallback
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
Comment thread pi3/pipe/pi3x_vo.py
model_kwargs['with_prior'] = True

- with torch.amp.autocast('cuda', dtype=dtype):
+ with torch.amp.autocast(chunk_imgs.device.type, dtype=dtype, enabled=chunk_imgs.device.type != 'cpu'):
@yyfz

yyfz commented May 17, 2026

Owner

Thanks for the PR. The direction looks useful, but I don’t think it is ready to merge yet.

PYTORCH_ENABLE_MPS_FALLBACK is set after import torch in the entry scripts, so it may not take effect. Please move it before importing torch, or document that users should launch with PYTORCH_ENABLE_MPS_FALLBACK=1 python ....
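One way to address the ordering concern, as a sketch: set the variable at the very top of each entry script, before the first `import torch`. The helper name `enable_mps_fallback` is hypothetical. Its return value only reports whether `torch` had already been imported in the current process, on the assumption (consistent with the comment above) that the flag is read when PyTorch is loaded.

```python
import os
import sys


def enable_mps_fallback(env=os.environ) -> bool:
    """Turn on the MPS CPU fallback unless the user already chose a value.

    Returns True if torch has not been imported yet, i.e. the flag was set
    early enough to be seen when PyTorch loads.
    """
    # setdefault keeps an explicit user setting such as "0" untouched.
    env.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
    return "torch" not in sys.modules


# Call this before any module that imports torch is loaded.
set_in_time = enable_mps_fallback()

# import torch  # only after the call above
```

Launching with `PYTORCH_ENABLE_MPS_FALLBACK=1 python ...`, as suggested above, avoids the ordering question entirely because the variable exists before the interpreter starts.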

Also, .gitignore seems to accidentally replace img_dir_to_video.py with img_dir_to_video.py.DS_Store; please fix that, ideally by adding a normal .DS_Store ignore rule.

After that, please confirm CUDA/CPU still work in addition to MPS.

3 participants